Defines an extractor for various types of content from PDF pages.
Initializes a new PDFContentExtractor object.
Initializes a new PDFContentExtractor object.
Extracts the colorspaces that exist in the page resources.
Extracts the page content stream as a list of graphic operators with their operands.
Extracts the information related to the images displayed on the page.
Extracts the content of an optional content group.
Extracts the text from the PDF page.
Extracts the text from the PDF page.
Extracts the text from the PDF page as a collection of Objects.
Extracts the text from the PDF page as a collection of Objects.
Extracts the text fragments from the PDF page.
Extracts the text fragments from the PDF page.
Extracts the page content as a list of visual Objects.
Extracts the page content as a list of visual Objects.
Extracts the page content as a list of visual Objects.
Extracts the text from the PDF page as a collection of words.
Extracts the text from the PDF page as a collection of words.
Gets the cmap factory.
Searches the page content for the specified text.
Searches the page content for the specified text.
Searches the page content for the specified text.
Sets the cmap factory.
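
To illustrate what "a list of graphic operators with their operands" can look like for a page content stream, here is a minimal Python sketch. It is not the PDFContentExtractor implementation or its API. The names `tokenize` and `parse_content_stream` are hypothetical, and the tokenizer is deliberately simplified: it assumes no nested or escaped parentheses, no hex strings, no dictionaries, and no inline images, and it treats keywords such as `true` or `null` as operators.

```python
import re

# Simplified token pattern (assumption: a small subset of PDF syntax only).
TOKEN = re.compile(
    r"""
    \((?P<string>[^()]*)\)                    # literal string, no nesting or escapes
  | \[(?P<array>[^\[\]]*)\]                   # flat array, e.g. the operand of TJ
  | (?P<name>/[^\s/\[\]()<>]+)                # name object such as /F1
  | (?P<number>[+-]?(?:\d+\.?\d*|\.\d+))      # integer or real number
  | (?P<operator>[A-Za-z'"*]+)                # operator keyword such as Tf or T*
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Yield (kind, value) pairs for each token found in the text."""
    for match in TOKEN.finditer(text):
        kind = match.lastgroup
        value = match.group(kind)
        if kind == "number":
            yield kind, float(value)
        elif kind == "array":
            # Parse the array body with the same rules and keep only the values.
            yield kind, [item for _, item in tokenize(value)]
        else:
            yield kind, value


def parse_content_stream(text):
    """Group tokens into (operator, operands) pairs.

    In a content stream the operands precede their operator, so operands
    are collected until an operator token closes the group.
    """
    operations = []
    operands = []
    for kind, value in tokenize(text):
        if kind == "operator":
            operations.append((value, operands))
            operands = []
        else:
            operands.append(value)
    return operations


if __name__ == "__main__":
    sample = "BT /F1 12 Tf 72 700 Td (Hello) Tj [(Wor) -20 (ld)] TJ ET"
    for operator, operands in parse_content_stream(sample):
        print(operator, operands)
```

Each entry pairs one operator with the operands that preceded it in the stream. A production extractor would additionally need to handle the full PDF syntax and decode string operands through the font encoding or CMap.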